@jlamypoirier jlamypoirier commented Jul 29, 2025

✨ Description

Decouple the block interface from the transformer. This makes the SSM interface cleaner, improves readability, prevents bugs, and simplifies the implementation of future mixers. It is also a step towards merging SSMs into the GPT model and towards varying block configurations (#242). I also included some groundwork for these tasks so that upcoming PRs are smaller and simpler.

This PR is a pure refactor. Changes that would raise backward-compatibility concerns are left for future work.

  • Create a block submodule and move everything that is not transformer-specific there from the transformer directory.
  • Extract base classes BlockConfig, MLPConfig, AttentionConfig from TransformerConfig. This uses inheritance for backward compatibility, although composition would be preferable.
  • Extract base classes BlockDimNames, BlockKwargs from their transformer counterparts for common variables. Use inheritance in their specializations to simplify usage (e.g., we can systematically use SSMDimNames, SSMKwargs when dealing with SSMs).
  • Remove TensorSpace entirely and create tensor dimensions on the fly instead. This both simplifies things and prevents unwanted conflicts between layers with different configs.
  • Remove the unused fields per_layer_lr_scale (from incorrect inheritance of BlockConfig) and normalization from SSMConfig, which would cause unexpected behaviour if defined.
  • Make a layer debugging utility to merge the various debug calls and help add new ones.
  • Move initialization to its own file.
  • Lots of misc interface fixes and improvements.
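
For readers unfamiliar with the trade-off mentioned in the list above, here is a rough sketch of inheritance versus composition for the extracted config classes, and of how a specialization such as SSMDimNames can reuse the common names. It is not the code in this PR: the class names come from the description, but the plain dataclasses, every field, every default, the string values, the exact base-class ordering, and `ComposedBlockConfig` are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BlockConfig:
    hidden_size: int = 1024  # hypothetical field


@dataclass
class MLPConfig:
    ffn_hidden_size: int = 4096  # hypothetical field


@dataclass
class AttentionConfig:
    num_attention_heads: int = 8  # hypothetical field


# Inheritance: every field stays at the top level, so configs written
# against the old flat layout keep working. The base ordering is assumed.
@dataclass
class TransformerConfig(AttentionConfig, MLPConfig, BlockConfig):
    pass


# Composition (hypothetical alternative): cleaner separation, but the
# field layout changes, which is a backward-compatibility concern.
@dataclass
class ComposedBlockConfig:
    hidden_size: int = 1024
    mixer: AttentionConfig = field(default_factory=AttentionConfig)
    mlp: MLPConfig = field(default_factory=MLPConfig)


# Common dimension names live in the base class; the SSM specialization
# inherits them, so SSM code can use SSMDimNames for everything.
class BlockDimNames:
    batch = "batch"  # hypothetical values
    sequence = "sequence"
    hidden = "hidden"


class SSMDimNames(BlockDimNames):
    state = "ssm_state"  # hypothetical SSM-specific name


legacy = TransformerConfig(hidden_size=2048, num_attention_heads=16)
composed = ComposedBlockConfig(
    hidden_size=2048, mixer=AttentionConfig(num_attention_heads=16)
)
```

With inheritance, `legacy.num_attention_heads` is reachable directly, whereas the composed variant needs `composed.mixer.num_attention_heads`. That difference in field layout is the kind of change this PR defers.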

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier jlamypoirier mentioned this pull request Aug 21, 2025
8 tasks
@jlamypoirier jlamypoirier marked this pull request as ready for review August 27, 2025 21:57

@tscholak tscholak left a comment

LGTM!

Base automatically changed from tp_mamba to main September 18, 2025 00:24
@jlamypoirier jlamypoirier merged commit cfe8d96 into main Sep 18, 2025
2 checks passed
@jlamypoirier jlamypoirier deleted the block_interface branch September 18, 2025 00:25